Scaling Relationships in Back-Propagation Learning: Dependence on Training Set Size

Author

  • Gerald Tesauro

Abstract

We study the amount of time needed to learn a fixed training set in the "back-propagation" procedure for learning in multi-layer neural network models. The task chosen was 32-bit parity, a high-order function for which memorization of specific input-output pairs is necessary. For small training sets, the learning time is consistent with a power-law dependence on the number of patterns in the training set. For larger training sets, the learning time diverges at a critical training set size which appears to be related to the storage capacity of the network.

There is now widespread interest in devising adaptive learning procedures for massively-parallel networks of neuron-like computing elements [1,2,10,11,12,17]. A prime example is the "back-propagation" learning procedure [8,9,6] for multi-layer feed-forward networks. In this procedure, a set of patterns is presented to the input layer of the network, and the network's output is computed by feed-forward propagation of the input signal. An error function is then obtained by calculating the mean squared difference between the network's output and the desired output for each pattern. The connection strengths, or weights, of the network are then modified so as to minimize the error function according to a gradient-descent rule. This algorithm has displayed impressive performance for small-scale problems, and it is now of great interest to determine how it will scale to larger, more difficult problems.

The question of scaling can be approached from several different directions. Typically one would ask how some measure of the network's performance (such as the training time, or the fraction of patterns classified correctly) scales with a parameter describing the network, the task, or the learning algorithm. Basic parameters describing the network include: the number of input units n, the number of hidden units h, and the number of layers l. Related parameters include: the total number of possible input patterns (2^n for binary inputs), and the storage capacity of the network.

© 1987 Complex Systems Publications, Inc.
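The forward pass, mean-squared-error function, and gradient-descent weight update described in the abstract can be sketched in a few lines of code. The sketch below trains a one-hidden-layer network on 2-bit parity (XOR) rather than the paper's 32-bit task; the architecture (4 hidden units), learning rate, and epoch count are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# 2-bit parity (XOR): the smallest parity task; the paper studies 32 bits.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # parity targets

W1 = rng.normal(size=(2, 4))  # input -> hidden weights (4 hidden units assumed)
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))  # hidden -> output weights
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    h = sigmoid(X @ W1 + b1)        # hidden-layer activations
    return h, sigmoid(h @ W2 + b2)  # network output

lr = 1.0
_, out = forward(X)
initial_loss = np.mean((out - y) ** 2)  # mean squared error before training

for _ in range(10000):
    h, out = forward(X)
    err = out - y
    # back-propagate the error (derivative of sigmoid s is s * (1 - s))
    d_out = err * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # gradient-descent weight updates
    W2 -= lr * h.T @ d_out / len(X)
    b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X)
    b1 -= lr * d_h.mean(axis=0)

_, out = forward(X)
final_loss = np.mean((out - y) ** 2)
print(initial_loss, final_loss)  # the error should shrink as training proceeds
```

The training time measured in the paper corresponds to the number of passes through the loop above needed to reach a fixed error criterion, as a function of how many patterns appear in `X`.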


Related articles

Deep Learning Scaling is Predictable, Empirically

Deep learning (DL) creates impactful advances following a virtuous recipe: model architecture search, creating large training data sets, and scaling computation. It is widely believed that growing training sets and models should improve accuracy and result in better products. As DL application domains grow, we would like a deeper understanding of the relationships between training set size, com...


Neural Network Performance Analysis for Real Time Hand Gesture Tracking Based on Hu Moment and Hybrid Features

This paper presents a comparison study between the multilayer perceptron (MLP) and radial basis function (RBF) neural networks with supervised learning and back propagation algorithm to track hand gestures. Both networks have two output classes which are hand and face. Skin is detected by a regional based algorithm in the image, and then networks are applied on video sequences frame by frame in...


Estimation of pull-in instability voltage of Euler-Bernoulli micro beam by back propagation artificial neural network

The static pull-in instability of beam-type micro-electromechanical systems is theoretically investigated. Two engineering cases including cantilever and double cantilever micro-beam are considered. Considering the mid-plane stretching as the source of the nonlinearity in the beam behavior, a nonlinear size-dependent Euler-Bernoulli beam model is used based on a modified couple stress theory, c...


The Profit of Learning Exceptions

For many classification tasks, the set of available task instances can be roughly divided into regular instances and exceptions. We investigate three learning algorithms that apply a different method of learning with respect to regularities and exceptions, viz. (i) back-propagation, (ii) cascade back-propagation (a constructive version of back-propagation), and (iii) information-gain tree (an ind...



Journal:
  • Complex Systems

Volume 1, Issue 

Pages  -

Published 1987